Abstract

In 2009, two different groups independently explored the behavior of random threshold graphs. Here, we extend their techniques to find the distribution of other properties, including matching number, degeneracy, and length of the longest cycle.

1 Introduction 

An undirected graph G is a threshold graph if there exist a real-valued function w, assigning weights to the vertex set V(G), and a threshold t such that two distinct vertices u, v are adjacent if and only if w(u) + w(v) exceeds t. These graphs, first defined in 1973 by Chvátal and Hammer [2], also have several other equivalent characterizations, which led to their occasional "rediscovery" over the following two decades.

In 2009, it was independently shown by Reilly and Scheinerman [7], as well as by Diaconis, Holmes, and Janson [3], that generating random threshold graphs by choosing n vertex weights independently and uniformly on [0, 1] (with t = 1) in fact produces the uniform distribution on the set of all unlabeled n-vertex threshold graphs. The two teams then used this equivalence to find properties ranging from the distribution of the number of isolated vertices to the likelihood of Hamiltonicity.

Here, we take their results and extend them, using the encodable nature 
of threshold graphs to determine the distributions and likelihoods of other 
graph invariants. 
2 Basics



One of the many equivalent characterizations of threshold graphs is that they can be constructed from a single vertex by repeatedly adding an isolated vertex or a dominating vertex [2, 6]. So a threshold graph on n vertices is completely determined by this record of $n-1$ additions; if we mark a 0 for the addition of an isolated vertex and a 1 for the addition of a dominating vertex, we get a binary sequence known as the creation sequence. (This definition, drawn from [7], is equivalent to the binary code defined in [3]. It differs slightly from the creation sequence definition of [5], which is closer to the extended binary code of [3], as both allocate an extra digit for the original single vertex.)
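The construction translates directly into code. The following Python sketch is our own illustration rather than code from the cited works; the function name `build_threshold_graph` and the list-of-adjacency-sets representation are assumptions made for the example. It replays the additions recorded by a creation sequence, with vertex 0 as the original single vertex and the vertex added for the i-th digit receiving index i.

```python
def build_threshold_graph(seq: str) -> list[set[int]]:
    """Adjacency sets of the threshold graph whose creation sequence is `seq`.

    Vertex 0 is the original single vertex; the vertex added for the i-th
    digit (reading left to right, starting at 1) receives index i.
    """
    adj: list[set[int]] = [set()]  # start from the single initial vertex
    for digit in seq:
        new = len(adj)  # index of the vertex being added
        if digit == "1":
            # dominating vertex: join it to every vertex present so far
            adj.append(set(range(new)))
            for v in range(new):
                adj[v].add(new)
        elif digit == "0":
            # isolated vertex: no edges at the time of its addition
            adj.append(set())
        else:
            raise ValueError(f"creation sequences are binary, got {digit!r}")
    return adj


print([sorted(nbrs) for nbrs in build_threshold_graph("0101")])
# [[2, 4], [2, 4], [0, 1, 4], [4], [0, 1, 2, 3]]
```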

Given a threshold graph G with n vertices, we let seq(G) denote the $(n-1)$-digit creation sequence of G. Conversely, given a binary string s of length $n-1$, $\Gamma(s)$ is the unlabeled threshold graph G on n vertices such that seq(G) = s. From this, we see that the number of n-vertex threshold graphs is exactly $2^{n-1}$.

These properties suggest two natural methods for random generation of a threshold graph with n vertices. The first is to choose the n weights independently and uniformly from [0, 1] with threshold t = 1, and let G denote the unlabeled threshold graph induced by the weights; in this model, the probability of any particular edge uv being in the graph is exactly 1/2. Alternatively, we can select G uniformly at random from the set of all $2^{n-1}$ threshold graphs on n vertices.

A critical result in the study of random threshold graphs was the 2009 proof that these two methods have the same distribution. That is, via independent arguments, collaborations between Diaconis, Holmes, and Janson [3] and between Reilly and Scheinerman [7] showed that given an unlabeled threshold graph G with n vertices, the probability of G arising via the random selection of vertex weights drawn independently and uniformly from [0, 1], with threshold t = 1, is equal to $2^{1-n}$.
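The equivalence can also be explored numerically. The sketch below is our own illustration; it assumes weights drawn with Python's `random.random()` (uniform on [0, 1)) and uses hypothetical helper names. It recovers the creation sequence from a weight vector without building the graph: after sorting, if the lightest and heaviest remaining weights sum past t, the heaviest vertex is adjacent to every other remaining vertex, so it can be peeled off as the most recent dominating addition (digit 1); otherwise the lightest vertex is adjacent to none of them and is peeled off as an isolated addition (digit 0).

```python
import random
from collections import Counter


def creation_sequence_from_weights(weights: list[float], t: float = 1.0) -> str:
    """Creation sequence of the graph in which u ~ v iff w(u) + w(v) > t."""
    w = sorted(weights)
    lo, hi = 0, len(w) - 1
    digits_reversed = []  # digits are discovered from the last addition backwards
    while lo < hi:
        if w[lo] + w[hi] > t:
            digits_reversed.append("1")  # heaviest remaining vertex dominates the rest
            hi -= 1
        else:
            digits_reversed.append("0")  # lightest remaining vertex is isolated from the rest
            lo += 1
    # The lone survivor plays the role of the initial vertex and contributes no digit.
    return "".join(reversed(digits_reversed))


def empirical_counts(n: int, trials: int, seed: int = 0) -> Counter:
    """Tally the creation sequences of `trials` graphs built from n uniform weights."""
    rng = random.Random(seed)
    return Counter(
        creation_sequence_from_weights([rng.random() for _ in range(n)])
        for _ in range(trials)
    )


counts = empirical_counts(4, 80_000)
print(sorted(counts.items()))  # each of the 8 sequences should appear roughly 10,000 times
```

Sorting once and moving two pointers inward keeps the peeling at $O(n \log n)$ overall, instead of recomputing degrees after every removal.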

As a consequence, when examining random threshold graphs, we can discard the continuous random variables involving vertex weights and restrict ourselves to the discrete creation sequences. So to compute the probability P of a random threshold graph G having a certain invariant, we first find the properties of the creation sequence necessary and sufficient to evoke such behavior in G. Then P equals $2^{1-n}$ times the number of $(n-1)$-digit binary sequences with such properties, a number that is usually easier to count.
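For small n, this reduction turns any question about a uniformly random threshold graph into an exact enumeration of creation sequences. A minimal sketch, where `probability_by_enumeration` is a hypothetical helper and the property is supplied as a predicate on the creation sequence:

```python
from fractions import Fraction
from itertools import product
from typing import Callable


def probability_by_enumeration(n: int, has_property: Callable[[str], bool]) -> Fraction:
    """Exact probability that a uniform random n-vertex threshold graph has a property.

    Each of the 2**(n-1) creation sequences is equally likely, so the answer is
    (number of qualifying sequences) / 2**(n-1).
    """
    count = sum(
        1 for bits in product("01", repeat=n - 1) if has_property("".join(bits))
    )
    return Fraction(count, 2 ** (n - 1))


# Edgeless graph: the creation sequence contains no ones.
print(probability_by_enumeration(5, lambda s: "1" not in s))  # 1/16
```

Connectivity is another example: for $n \ge 2$ a threshold graph is connected exactly when its last addition is a dominating vertex, so the predicate `lambda s: s.endswith("1")` yields 1/2.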

With the exception of that single initial vertex, also known as the base 
vertex, every element of V(G) can be classified according to its digit in seq(G). 
We call them zero-vertices or one-vertices, depending upon whether the
corresponding digit is zero or one, respectively. Furthermore, we use the 
creation sequence to refer to specific vertices, saying that a vertex has index 
i if it corresponds to the i-th digit in the creation sequence, reading from 
left to right. (The base vertex has index 0.) Note that this enumeration is 
a by-product of the structure of the graph, as opposed to an independent 
labelling. 

As a consequence of this construction, the relationship between any two 
vertices can be completely determined by their corresponding digits and their 
relative indices. Since one-vertices dominate all existing vertices at the time 
of their addition, and zero-vertices are isolated from all existing vertices, 
two vertices are adjacent if and only if the vertex of higher index has a 
corresponding digit of 1. So no two zero-vertices are adjacent, but all one-vertices are adjacent to each other, as well as to the base vertex.
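The adjacency rule needs no simulation of the construction. The sketch below (our own; the function name is hypothetical) lists the edges of the threshold graph directly from its creation sequence, using only the rule that vertices i < j are adjacent exactly when the digit of j is 1. It also illustrates a consequence of the rule: a one-vertex of index j has exactly j lower-index neighbors, so the number of edges is the sum of the indices of the one-vertices.

```python
def edges_by_index_rule(seq: str) -> set[tuple[int, int]]:
    """Edges (i, j), i < j, of the threshold graph with creation sequence `seq`.

    Vertex j >= 1 corresponds to digit seq[j - 1]; vertex 0 is the base vertex.
    """
    n = len(seq) + 1
    return {(i, j) for j in range(1, n) if seq[j - 1] == "1" for i in range(j)}


seq = "0101"
edges = edges_by_index_rule(seq)
one_indices = [j for j, d in enumerate(seq, start=1) if d == "1"]
print(sorted(edges))  # [(0, 2), (0, 4), (1, 2), (1, 4), (2, 4), (3, 4)]
print(len(edges) == sum(one_indices))  # True
```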

Proposition 2.1. For any two threshold graphs G and H, G is an induced subgraph of H if and only if seq(G) is a subsequence of seq(H).

Proof. Letting seq(G) = $s_1 s_2 \cdots s_m$ and seq(H) = $t_1 t_2 \cdots t_n$, suppose that there exist $j_1 < j_2 < \cdots < j_m$ such that seq(G) = $t_{j_1} t_{j_2} \cdots t_{j_m}$, and set $j_0 = 0$. For any $u, v \in V(G)$ of indices $i_u$ and $i_v$, let $u', v' \in V(H)$ be the vertices of indices $j_{i_u}$ and $j_{i_v}$. Then as the corresponding digits of u' and v' are equal to, and in the same order as, those of u and v, the former are adjacent in H if and only if the latter are adjacent in G. Thus, G must be an induced subgraph of H.

For the converse, it suffices to prove the claim when G is obtained by removing a single vertex of H; repeated application produces all other cases. Let G be the induced subgraph of H formed by the removal of vertex v, and consider the construction of seq(G). The removal of v does not change the classification of any other vertex: if vertex u was isolated from all lower-index vertices in H, it remains so in G. Similarly, if u dominated all lower-index vertices in H, it dominates all of those vertices that remain in G. (If v is the base vertex, the vertex of index 1 becomes the new base vertex, and its digit is simply dropped.) So when forming seq(G), the vertices can be taken in the same order as when forming seq(H), without changing their corresponding digits. □
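The converse direction can be checked mechanically on small cases. The sketch below (our own, with hypothetical helper names) recovers a creation sequence from an arbitrary graph by repeatedly peeling an isolated or dominating vertex, returning `None` when neither exists, and then confirms for every creation sequence of bounded length that deleting any single vertex leaves a threshold graph whose creation sequence is a subsequence of the original. This is a finite sanity check, not a proof.

```python
from itertools import product
from typing import Optional


def build_threshold_graph(seq: str) -> dict[int, set[int]]:
    """Adjacency sets of the threshold graph with creation sequence `seq` (vertex 0 = base)."""
    adj: dict[int, set[int]] = {0: set()}
    for new, digit in enumerate(seq, start=1):
        adj[new] = set(range(new)) if digit == "1" else set()  # dominating or isolated
        for v in adj[new]:
            adj[v].add(new)
    return adj


def creation_sequence(adj: dict[int, set[int]]) -> Optional[str]:
    """Creation sequence of a graph, or None if the graph is not a threshold graph.

    Repeatedly peels an isolated vertex (digit 0) or a dominating vertex (digit 1)
    from what remains; the last survivor is the base vertex.
    """
    remaining = set(adj)
    digits_reversed = []
    while len(remaining) > 1:
        degree = {v: len(adj[v] & remaining) for v in remaining}
        isolated = [v for v in remaining if degree[v] == 0]
        dominating = [v for v in remaining if degree[v] == len(remaining) - 1]
        if isolated:
            remaining.discard(isolated[0])
            digits_reversed.append("0")
        elif dominating:
            remaining.discard(dominating[0])
            digits_reversed.append("1")
        else:
            return None  # neither an isolated nor a dominating vertex exists
    return "".join(reversed(digits_reversed))


def is_subsequence(short: str, long: str) -> bool:
    it = iter(long)
    return all(ch in it for ch in short)  # each `in` consumes `it` up to the match


def deletions_give_subsequences(max_len: int) -> bool:
    """Check the converse of Proposition 2.1 for every single-vertex deletion."""
    for length in range(max_len + 1):
        for bits in product("01", repeat=length):
            t = "".join(bits)
            h_adj = build_threshold_graph(t)
            for v in h_adj:
                g_adj = {u: nbrs - {v} for u, nbrs in h_adj.items() if u != v}
                s = creation_sequence(g_adj)
                if s is None or not is_subsequence(s, t):
                    return False
    return True


print(deletions_give_subsequences(7))  # True
```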

Given a sequence $s = s_1 s_2 \cdots s_n$, we let the k-th tail of s be the subsequence consisting of the last k digits of s: $s_{n-k+1} s_{n-k+2} \cdots s_n$. In this vein, we define $z_k(s)$ and $u_k(s)$ to be the numbers of zeros and ones in the k-th tail, respectively.

We define a function h on the set of all finite binary sequences by setting, for any such sequence s,

$$h(s) = \max_{0 \le k \le |s|} \{ z_k(s) - u_k(s) \}.$$

So h(s) is the maximum amount, across all tails of s, by which the number of zeros exceeds the number of ones. Note that h is always non-negative, as the case k = 0 corresponds to the empty tail, in which there are no digits of either type.
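Computing h takes a single right-to-left pass, since each longer tail adds one digit to the left of the previous one. A minimal Python sketch (the function name `h` simply mirrors the notation):

```python
def h(s: str) -> int:
    """h(s) = max over 0 <= k <= len(s) of z_k(s) - u_k(s)."""
    best = excess = 0  # k = 0: the empty tail has no digits of either type
    for digit in reversed(s):  # extend the tail one digit to the left at a time
        excess += 1 if digit == "0" else -1
        best = max(best, excess)
    return best


print(h("0101100"))  # 2, attained by the tail "00"
```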

To make full use of h, we must first find its distribution.

Proposition 2.2. For a random threshold graph G on n vertices and $0 \le k \le n-1$,

$$P(h(\mathrm{seq}(G)) = k) = \left(\frac{1}{2}\right)^{n-1} \binom{n-1}{\lfloor (n-1-k)/2 \rfloor}.$$

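Proof. Using the uniformity of the distribution, we see that the probability of h(seq(G)) = k is proportional to the number of $(n-1)$-digit binary sequences that have a tail with k more zeros than ones, but where no tail has $k+1$ more zeros than ones.

To count the number of such sequences, we read the creation sequences from right to left, and interpret the digits as moves within an integer lattice. Starting at the origin, we move a single unit upwards whenever we encounter a zero, and rightwards whenever we encounter a one. In this framework, a tail with $m + k$ zeros and m ones produces a "staircase walk" from the origin to the point $(m, m + k)$.

So the number of $(n-1)$-digit sequences s such that h(s) = k equals the number of $(n-1)$-step staircase walks that touch, but do not cross, the line $y = x + k$. Let $A_k$ be the number of such walks that reach the line $y = x + k$, and $E_j$ the number that end on the line $y = x + j$. For $k \ge 1$, the reflection principle gives $A_k = \sum_{j \ge k} E_j + \sum_{j > k} E_j$, and the same identity holds for $k = 0$ because, by the symmetry between zeros and ones, as many walks end below the diagonal as above it. Hence the desired count is $A_k - A_{k+1} = E_k + E_{k+1}$. A walk of $n-1$ steps ends on $y = x + j$ only if $j \equiv n-1 \pmod 2$, so exactly one of these two terms is non-zero, and its walks take exactly $\lfloor (n-1-k)/2 \rfloor$ steps to the right. Thus,

$$P(h(\mathrm{seq}(G)) = k) = \left(\frac{1}{2}\right)^{n-1} \binom{n-1}{\lfloor (n-1-k)/2 \rfloor}.$$

□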
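Because the distribution is exact, it can be compared against brute-force enumeration for small n. The sketch below (our own; helper names are hypothetical) tabulates h over all $2^{n-1}$ creation sequences and compares the result with $(1/2)^{n-1}\binom{n-1}{\lfloor (n-1-k)/2 \rfloor}$ using exact rational arithmetic.

```python
from fractions import Fraction
from itertools import product
from math import comb


def h(s: str) -> int:
    """Maximum, over all tails of s, of (#zeros - #ones)."""
    best = excess = 0
    for digit in reversed(s):
        excess += 1 if digit == "0" else -1
        best = max(best, excess)
    return best


def h_distribution_exact(n: int) -> list[Fraction]:
    """P(h(seq(G)) = k) for k = 0, ..., n - 1, by enumerating creation sequences."""
    counts = [0] * n  # h never exceeds n - 1 (the all-zero sequence)
    for bits in product("01", repeat=n - 1):
        counts[h("".join(bits))] += 1
    return [Fraction(c, 2 ** (n - 1)) for c in counts]


def h_distribution_formula(n: int) -> list[Fraction]:
    """(1/2)^(n-1) * C(n-1, floor((n-1-k)/2)) for k = 0, ..., n - 1."""
    return [Fraction(comb(n - 1, (n - 1 - k) // 2), 2 ** (n - 1)) for k in range(n)]


print(all(h_distribution_exact(n) == h_distribution_formula(n) for n in range(1, 13)))  # True
```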

3 Planarity 

By Kuratowski's Theorem, G is planar if and only if it does not contain a subgraph that is a subdivision of $K_5$ or $K_{3,3}$. As such, we will use the following result, which shows exactly when G has $K_5$ as a subgraph:



Lemma 3.1 (Reilly, Scheinerman). For a threshold graph G, the size of the
maximum clique is one more than the number of one-vertices. 

Proposition 3.2. A threshold graph G with s = seq(G) is planar if and only 
if s contains no subsequence of the form 1111 or 00111. 

Proof. We shall show that the existence of Kuratowski's offending subgraphs 
in G is equivalent to the existence of the above subsequences in s. 

First, suppose that s contains a subsequence of the form 1111. Then by Lemma 3.1, G contains $K_5$ as an induced subgraph. Alternatively, if s contains some 00111, then there must exist three one-vertices that are each adjacent to three other vertices: the vertices corresponding to the two zeros and the base vertex. Thus, G contains $K_{3,3}$ as a subgraph. In either of these cases, we see that G is non-planar.

Conversely, suppose that s contains no such subsequences, which leads to two subcases: either there are at most two ones in s, or there are exactly three ones, one of which has index at most two. In the former event, there are at most two vertices of degree exceeding two. Since subdividing does not increase the degree of existing vertices, no subdivision of any subgraph can have three vertices of degree three or more, eliminating subdivisions of $K_5$ and $K_{3,3}$ as possibilities.

Similarly, if G contains exactly three one-vertices, one of which has index at most two, then said one-vertex has degree at most four. There are at most five vertices of degree three or more: the three one-vertices, and any other vertices of lesser index than the first one-vertex, of which there are at most two. Thus, no subgraph of G is a subdivision of $K_{3,3}$, which would require six vertices of degree three or more. And as the base vertex and the zero-vertices have degree at most three, at most three vertices have degree four or more, so a subdivision of $K_5$, which would require five such vertices, is also impossible. Thus, G must be planar. □

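Theorem 3.3. For a random threshold graph G with $n \ge 3$ vertices,

$$P(G \text{ is planar}) = \left(\frac{1}{2}\right)^{n-1}\left[1 + (n-1) + \binom{n-1}{2} + \binom{n-2}{2} + \binom{n-3}{2}\right] = \frac{3n^2 - 13n + 20}{2^n}.$$

Proof. Letting $s = \mathrm{seq}(G) = s_1 s_2 \cdots s_{n-1}$, we see by Proposition 3.2 that G is planar if and only if s contains at most two ones, or s has exactly three ones, one of which must be $s_1$ or $s_2$. By counting the number of such sequences, the probability of the former event is

$$\left(\frac{1}{2}\right)^{n-1}\left[\binom{n-1}{0} + \binom{n-1}{1} + \binom{n-1}{2}\right].$$

For the latter event, we further subdivide into the disjoint events $\{s_1 = 1\}$ and $\{s_1 = 0, s_2 = 1\}$, in which the remaining $n-2$ and $n-3$ digits, respectively, must contain exactly two ones. These have a combined probability of

$$\left(\frac{1}{2}\right)^{n-1}\left[\binom{n-2}{2} + \binom{n-3}{2}\right].$$

Adding the two probabilities and simplifying gives the stated formula. □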
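As a finite check of the counting step, the following sketch (our own, with hypothetical names) enumerates creation sequences, applies the subsequence criterion of Proposition 3.2, and compares the count with $(3n^2 - 13n + 20)/2$, the number of planar creation sequences implied by the case analysis in the proof. It tests the counting only and takes Proposition 3.2 as given; it does not run a planarity test on the graphs themselves.

```python
from itertools import product


def contains_subsequence(seq: str, pattern: str) -> bool:
    it = iter(seq)
    return all(ch in it for ch in pattern)  # each `in` consumes `it` up to the match


def planar_sequences(n: int) -> int:
    """Count creation sequences of length n - 1 with no 1111 or 00111 subsequence."""
    count = 0
    for bits in product("01", repeat=n - 1):
        s = "".join(bits)
        if not contains_subsequence(s, "1111") and not contains_subsequence(s, "00111"):
            count += 1
    return count


for n in range(3, 9):
    # columns: n, enumerated count, (3n^2 - 13n + 20) / 2; the last two agree
    print(n, planar_sequences(n), (3 * n * n - 13 * n + 20) // 2)
```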




4 Matching Number 

Reilly and Scheinerman found the probability of a random threshold graph having a perfect matching. Here, we explore the distribution of the matching number $\nu(G)$, the number of edges in a maximum matching.

Lemma 4.1 (Reilly, Scheinerman). A threshold graph G with an even number 
of vertices contains a perfect matching if and only if h(seq(G)) = 0. 

Corollary 4.2. A threshold graph G with an odd number of vertices has a 
near-perfect matching if h(seq(G)) = 0. 

Proof. Letting seq(G) = $s_1 s_2 \cdots s_k$, define the subsequence $s' = s_2 s_3 \cdots s_k$. Then $h(s') = 0$, as every tail of s' is also a tail of seq(G). Since $\Gamma(s')$ is a threshold graph with one fewer vertex than G, it has a perfect matching by Lemma 4.1; and since $\Gamma(s')$ is an induced subgraph of G by Proposition 2.1, this is also a matching in G. □

Proposition 4.3. For a threshold graph G with n vertices,

$$\nu(G) = \left\lfloor \frac{n - h(\mathrm{seq}(G))}{2} \right\rfloor.$$

Proof. The case h(seq(G)) = 0 having already been handled by Lemma 4.1 and Corollary 4.2, we assume $h(\mathrm{seq}(G)) \ge 1$. Let s = seq(G).

As $2\nu(G)$ is the maximum number of vertices covered by a matching, $n - 2\nu(G)$ is the minimum, across all matchings, of the number of unmatched vertices.
Let m denote a maximizing index for h(s), so that $h(s) = z_m(s) - u_m(s)$. Then there are h(s) more zeros than ones amongst the final m digits. As zero-vertices are adjacent only to one-vertices of higher index, the vertices corresponding to zeros in the tail can only be adjacent to one-vertices in the same tail, so every matching leaves at least h(s) of them unmatched. Thus, $n - 2\nu(G) \ge h(s)$.

Next, let us define a new binary sequence s' by removing the h(s) right-most zeros from s. Then $h(s') = 0$: a tail of s' either contains no zeros at all, or corresponds to a tail of s that contains all h(s) removed zeros and so, after their removal, has at most $h(s) - h(s) = 0$ more zeros than ones. So by Lemma 4.1 and Corollary 4.2, the threshold graph $\Gamma(s')$, which has $n - h(s)$ vertices, has a matching of size $\lfloor (n - h(s))/2 \rfloor$. And as $\Gamma(s')$ is an induced subgraph of G, $\nu(G) \ge \nu(\Gamma(s')) = \lfloor (n - h(s))/2 \rfloor$. Combined with $\nu(G) \le (n - h(s))/2$ and the integrality of $\nu(G)$, this proves the claim. □
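The two bounds can be confirmed on small graphs by exhaustive search. In the sketch below (our own; names are hypothetical), `max_matching_size` computes $\nu(G)$ by brute force, deciding for the lowest free vertex whether it stays unmatched or is matched to a free neighbor, and the result is compared with $\lfloor (n - h(\mathrm{seq}(G)))/2 \rfloor$ for every threshold graph on at most 9 vertices. The search is exponential in general and is only meant for tiny n.

```python
from functools import lru_cache
from itertools import product


def build_threshold_graph(seq: str) -> list[frozenset[int]]:
    """Adjacency sets; for i < j, vertices i and j are adjacent iff digit j is 1."""
    n = len(seq) + 1
    return [
        frozenset(j for j in range(n) if j != i and seq[max(i, j) - 1] == "1")
        for i in range(n)
    ]


def h(s: str) -> int:
    """Maximum, over all tails of s, of (#zeros - #ones)."""
    best = excess = 0
    for digit in reversed(s):
        excess += 1 if digit == "0" else -1
        best = max(best, excess)
    return best


def max_matching_size(adj: list[frozenset[int]]) -> int:
    """Exhaustive search: the lowest free vertex is either left unmatched or matched."""

    @lru_cache(maxsize=None)
    def best(free: frozenset[int]) -> int:
        if not free:
            return 0
        v = min(free)
        rest = free - {v}
        result = best(rest)  # leave v unmatched
        for u in adj[v] & rest:  # or match v with a free neighbor u
            result = max(result, 1 + best(rest - {u}))
        return result

    return best(frozenset(range(len(adj))))


def check_proposition_4_3(max_n: int) -> bool:
    for n in range(1, max_n + 1):
        for bits in product("01", repeat=n - 1):
            s = "".join(bits)
            if max_matching_size(build_threshold_graph(s)) != (n - h(s)) // 2:
                return False
    return True


print(check_proposition_4_3(9))  # True
```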

Having determined the properties of the creation sequence responsible for 
a matching number of given size, we can compute its likelihood. 

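Theorem 4.4. For a random threshold graph G with n vertices,

$$P(\nu(G) = k) = \begin{cases} \left(\frac{1}{2}\right)^{n-1} \binom{n}{k}, & 0 \le k < \frac{n}{2}, \\ \left(\frac{1}{2}\right)^{n} \binom{n}{n/2}, & k = \frac{n}{2}. \end{cases}$$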

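Proof. Letting s denote seq(G), we note that since h can only assume integer values,

$$P(\nu(G) = k) = P(n - 2k - 1 \le h(s) \le n - 2k) = P(h(s) = n - 2k - 1) + P(h(s) = n - 2k).$$

As h(s) must be non-negative, both terms are meaningful for $0 \le k < n/2$, and Proposition 2.2 gives

$$P(\nu(G) = k) = \left(\frac{1}{2}\right)^{n-1}\left[\binom{n-1}{k} + \binom{n-1}{k-1}\right] = \left(\frac{1}{2}\right)^{n-1}\binom{n}{k}.$$

As for k = n/2 (when n is even), the term with $h(s) = -1$ vanishes, and we see that

$$P\left(\nu(G) = \frac{n}{2}\right) = P(h(s) = 0) = \left(\frac{1}{2}\right)^{n-1}\binom{n-1}{n/2 - 1} = \left(\frac{1}{2}\right)^{n}\binom{n}{n/2}.$$

□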
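Given Proposition 4.3, the distribution can also be tabulated directly. The following sketch (our own, with hypothetical names) compares the enumerated distribution of $\lfloor (n - h(s))/2 \rfloor$ with the closed form $(1/2)^{n-1}\binom{n}{k}$ for $k < n/2$ and $(1/2)^{n}\binom{n}{n/2}$ for $k = n/2$. It relies on Proposition 4.3 rather than computing matchings.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import comb


def h(s: str) -> int:
    """Maximum, over all tails of s, of (#zeros - #ones)."""
    best = excess = 0
    for digit in reversed(s):
        excess += 1 if digit == "0" else -1
        best = max(best, excess)
    return best


def matching_distribution_exact(n: int) -> dict[int, Fraction]:
    """Distribution of nu(G) via nu(G) = floor((n - h(seq(G))) / 2)."""
    counts = Counter((n - h("".join(bits))) // 2 for bits in product("01", repeat=n - 1))
    return {k: Fraction(c, 2 ** (n - 1)) for k, c in counts.items()}


def matching_distribution_formula(n: int) -> dict[int, Fraction]:
    """Closed form: (1/2)^(n-1) C(n, k) for k < n/2, (1/2)^n C(n, n/2) for k = n/2."""
    dist = {k: Fraction(comb(n, k), 2 ** (n - 1)) for k in range((n + 1) // 2)}  # k < n/2
    if n % 2 == 0:
        dist[n // 2] = Fraction(comb(n, n // 2), 2 ** n)
    return dist


print(matching_distribution_formula(4))  # {0: Fraction(1, 8), 1: Fraction(1, 2), 2: Fraction(3, 8)}
```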
5 Longest Cycle Length



For a graph G, let $\psi(G)$ denote the length of the longest cycle in G. For a graph G on n vertices, Reilly and Scheinerman found the probability that $\psi(G) = n$, corresponding to the event in which G is Hamiltonian, through the following result:

Lemma 5.1 (Reilly, Scheinerman). Let G be a threshold graph with $n \ge 3$ vertices. Then G is Hamiltonian if and only if $u_k(\mathrm{seq}(G)) > z_k(\mathrm{seq}(G))$ for all $1 \le k \le n-1$.

Corollary 5.2. Let G be a threshold graph with $n \ge 3$ vertices, and seq(G) = $s_1 s_2 \cdots s_{n-1}$. Then G is Hamiltonian if and only if $s_{n-1} = 1$ and $h(s_1 s_2 \cdots s_{n-2}) = 0$.

Here, we generalize to find the full distribution of $\psi(G)$. We define, for a binary sequence $s = s_1 s_2 \cdots s_k$, the function r(s) by

$$r(s) = \max(\{0\} \cup \{i : s_i = 1\}).$$

That is, r(s) returns the index of the right-most one in s if such a one exists, and zero otherwise.

Proposition 5.3. For a threshold graph G with s = seq(G), if s contains at least two ones, then

$$\psi(G) = r(s) + 1 - h(s'),$$

where s' is the subsequence of s defined by $s' = s_1 s_2 \cdots s_{r(s)-1}$.

Proof. First, note that this formulation, like that for $\nu(G)$, can be expressed in terms of the number $n - \psi(G)$ of excluded vertices. To find the minimum, over all cycles, of the number of vertices a cycle skips, we begin by excluding the isolated vertices, of which there are exactly $n - 1 - r(s)$.

As for the non-trivial connected component, which corresponds to the sequence $s'1$, let m be a maximizing index for $h(s')$, so that $h(s') = z_m(s') - u_m(s')$. Then no cycle can contain more than $2u_m(s')$ of the vertices corresponding to the last m digits of s'. For if some cycle Y were to contain $u_m(s') + 1$ of the zero-vertices among them, then the threshold subgraph $H_Y \subseteq G$ induced by the vertices of Y would have a Hamiltonian cycle but a creation sequence in which some tail contained at least as many zeros as ones, contradicting Corollary 5.2. So at least $h(s')$ of the zero-vertices in the connected component must also be excluded, and thus

$$n - \psi(G) \ge (n - 1 - r(s)) + h(s').$$

Let us define another binary sequence $s''$ by removing the right-most $h(s')$ zeros from s'; then $h(s'') = 0$ and $|s''| = r(s) - 1 - h(s')$. Since $s''$ keeps every 1 of s', and s' contains a 1 because s has at least two ones, $\Gamma(s''1)$ is a threshold graph on $r(s) + 1 - h(s') \ge 3$ vertices. It is an induced subgraph of G by Proposition 2.1, and it contains a Hamiltonian cycle by Corollary 5.2. Thus, $\psi(G) \ge r(s) + 1 - h(s')$. □
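Proposition 5.3 can be tested against an exhaustive longest-cycle search on small graphs. The sketch below is our own; its names are hypothetical, and the search is exponential, so it is only meant for tiny n. It implements r(s) as defined above, finds a longest cycle by depth-first search rooted at each cycle's smallest vertex, and compares the result with $r(s) + 1 - h(s_1 \cdots s_{r(s)-1})$ for every creation sequence with at least two ones on at most 7 vertices.

```python
from itertools import product


def build_threshold_graph(seq: str) -> list[set[int]]:
    """Adjacency sets; for i < j, vertices i and j are adjacent iff digit j is 1."""
    n = len(seq) + 1
    return [{j for j in range(n) if j != i and seq[max(i, j) - 1] == "1"} for i in range(n)]


def h(s: str) -> int:
    """Maximum, over all tails of s, of (#zeros - #ones)."""
    best = excess = 0
    for digit in reversed(s):
        excess += 1 if digit == "0" else -1
        best = max(best, excess)
    return best


def r(s: str) -> int:
    """1-based index of the right-most one in s, or 0 if s has no ones."""
    return max([0] + [i for i, d in enumerate(s, start=1) if d == "1"])


def longest_cycle_length(adj: list[set[int]]) -> int:
    """Length of a longest cycle (0 if acyclic), by exhaustive depth-first search.

    Each cycle is found from its smallest vertex `start`, so the search only
    walks through vertices larger than `start`.
    """
    best = 0

    def extend(start: int, v: int, visited: set[int], length: int) -> None:
        nonlocal best
        for u in adj[v]:
            if u == start and length >= 3:
                best = max(best, length)  # edge back to start closes a cycle
            elif u > start and u not in visited:
                visited.add(u)
                extend(start, u, visited, length + 1)
                visited.remove(u)

    for start in range(len(adj)):
        extend(start, start, {start}, 1)
    return best


def check_proposition_5_3(max_n: int) -> bool:
    for n in range(3, max_n + 1):
        for bits in product("01", repeat=n - 1):
            s = "".join(bits)
            if s.count("1") >= 2:
                predicted = r(s) + 1 - h(s[: r(s) - 1])  # s' = s_1 ... s_{r(s)-1}
                if longest_cycle_length(build_threshold_graph(s)) != predicted:
                    return False
    return True


print(check_proposition_5_3(7))  # True
```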

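Theorem 5.4. For a random threshold graph G on n vertices and $3 \le k \le n$,

$$P(\psi(G) = k) = \left(\frac{1}{2}\right)^{n-1}\left[\binom{n-1}{\lfloor k/2 \rfloor} - \binom{k-2}{\lfloor k/2 \rfloor}\right].$$

Proof. Let s = seq(G). If s contains fewer than two ones, then G is acyclic, and for such sequences the condition below can hold only when k = 2, so they do not affect the computation. By Proposition 5.3, in order for $\psi(G)$ to equal k, we require that

$$r(s) - k + 1 = h(s_1 s_2 \cdots s_{r(s)-1}).$$

So for $3 \le k \le n$,

$$\begin{aligned}
P(\psi(G) = k) &= \sum_{j=1}^{n-1} P(r(s) = j,\ \psi(G) = k) \\
&= \sum_{j=1}^{n-1} P\big(r(s) = j,\ r(s) - k + 1 = h(s_1 s_2 \cdots s_{r(s)-1})\big) \\
&= \sum_{j=k-1}^{n-1} P\big(r(s) = j,\ j - k + 1 = h(s_1 s_2 \cdots s_{j-1})\big),
\end{aligned}$$

where the lower limit becomes $k - 1$ because h is non-negative.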

Note that the two intersecting events are independent: the first is determined by the digits $s_j, \ldots, s_{n-1}$ (it requires $s_j = 1$ and every later digit to be zero, so it has probability $(1/2)^{n-j}$), whereas the second concerns only the preceding digits $s_1, \ldots, s_{j-1}$, and there is no overlap. And since each individual digit is chosen independently, the distribution of $s_1 s_2 \cdots s_{j-1}$ is uniform over all binary sequences of length $j - 1$, so Proposition 2.2 applies to it:
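$$\begin{aligned}
P(\psi(G) = k) &= \sum_{j=k-1}^{n-1} P(r(s) = j)\, P\big(h(s_1 s_2 \cdots s_{j-1}) = j - k + 1\big) \\
&= \sum_{j=k-1}^{n-1} \left(\frac{1}{2}\right)^{n-j} \left(\frac{1}{2}\right)^{j-1} \binom{j-1}{\lfloor (k-2)/2 \rfloor} \\
&= \left(\frac{1}{2}\right)^{n-1} \sum_{j=k-1}^{n-1} \binom{j-1}{\lfloor k/2 \rfloor - 1} \\
&= \left(\frac{1}{2}\right)^{n-1} \left[\binom{n-1}{\lfloor k/2 \rfloor} - \binom{k-2}{\lfloor k/2 \rfloor}\right],
\end{aligned}$$

where the last step uses the hockey-stick identity $\sum_{i=0}^{m} \binom{i}{a} = \binom{m+1}{a+1}$. □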
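The closed form $(1/2)^{n-1}\left[\binom{n-1}{\lfloor k/2 \rfloor} - \binom{k-2}{\lfloor k/2 \rfloor}\right]$ can be compared with an enumeration over creation sequences. The sketch below (our own, with hypothetical names) uses Proposition 5.3 to compute $\psi(G)$ for each sequence with at least two ones, so it checks the summation rather than Proposition 5.3 itself.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import comb


def h(s: str) -> int:
    """Maximum, over all tails of s, of (#zeros - #ones)."""
    best = excess = 0
    for digit in reversed(s):
        excess += 1 if digit == "0" else -1
        best = max(best, excess)
    return best


def r(s: str) -> int:
    """1-based index of the right-most one in s, or 0 if s has no ones."""
    return max([0] + [i for i, d in enumerate(s, start=1) if d == "1"])


def cycle_distribution_exact(n: int) -> dict[int, Fraction]:
    """Distribution of psi(G) over sequences with at least two ones (others are acyclic)."""
    counts = Counter()
    for bits in product("01", repeat=n - 1):
        s = "".join(bits)
        if s.count("1") >= 2:
            counts[r(s) + 1 - h(s[: r(s) - 1])] += 1
    return {k: Fraction(c, 2 ** (n - 1)) for k, c in counts.items()}


def cycle_distribution_formula(n: int) -> dict[int, Fraction]:
    """Closed form for 3 <= k <= n."""
    return {
        k: Fraction(comb(n - 1, k // 2) - comb(k - 2, k // 2), 2 ** (n - 1))
        for k in range(3, n + 1)
    }


print(cycle_distribution_formula(4))  # {3: Fraction(1, 4), 4: Fraction(1, 4)}
```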



6 k-Core

The k-core of a graph G is the maximal induced subgraph $H \subseteq G$ such that all vertices of H have degree at least k in H; it can be formed by iteratively deleting all vertices with degree less than k. The degeneracy of G is the largest k such that the k-core of G is non-empty. An equivalent formulation for the degeneracy is the maximum, over all induced subgraphs $H \subseteq G$, of the minimum degree of a vertex in H. That is,

$$\mathrm{degen}(G) = \max_{H \subseteq G} \min_{v \in V(H)} \deg_H(v).$$
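The iterative-deletion definition is straightforward to implement. A minimal sketch (our own; function names are hypothetical) of the k-core and of the degeneracy as the largest k with a non-empty k-core, applied to threshold graphs built from their creation sequences by the adjacency rule of Section 2. On small cases it can also be used to confirm Corollary 6.2 below.

```python
from itertools import product


def build_threshold_graph(seq: str) -> list[set[int]]:
    """Adjacency sets; for i < j, vertices i and j are adjacent iff digit j is 1."""
    n = len(seq) + 1
    return [{j for j in range(n) if j != i and seq[max(i, j) - 1] == "1"} for i in range(n)]


def k_core(adj: list[set[int]], k: int) -> set[int]:
    """Vertex set of the k-core, by repeatedly deleting vertices of degree < k."""
    alive = set(range(len(adj)))
    changed = True
    while changed:
        changed = False
        for v in list(alive):
            if len(adj[v] & alive) < k:  # degree within the surviving subgraph
                alive.discard(v)
                changed = True
    return alive


def degeneracy(adj: list[set[int]]) -> int:
    """Largest k whose k-core is non-empty."""
    k = 0
    while k_core(adj, k + 1):
        k += 1
    return k


print(degeneracy(build_threshold_graph("0101")))  # 2
```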

These two concepts allow us to examine the density of our randomly 
generated graphs: 

Proposition 6.1. For a threshold graph G, $\mathrm{degen}(G) \ge d$ if and only if $K_{d+1} \subseteq G$.

Proof. First, assume that G contains a subgraph isomorphic to $K_{d+1}$. Then letting H denote said subgraph, we see that every vertex of H has degree exactly d, and thus the degeneracy is at least d, by the second definition given above.

Next, assume that G has degeneracy greater than or equal to d. Then G has a non-empty d-core, and therefore there exists a subset $V' \subseteq V(G)$ in which every vertex of V' is adjacent to at least d other members of V'. Then $|V'| \ge d + 1$, and since the base vertex and the zero-vertices are adjacent only to one-vertices, V' must contain at least d of G's one-vertices. So G contains at least d one-vertices, and thus a clique of size $d + 1$, as all one-vertices are adjacent to each other, as well as to the base vertex. □

Corollary 6.2. For a threshold graph G, degen(G) = d if and only if seq(G) contains exactly d ones.



Corollary 6.3. For a random threshold graph G with n vertices,

$$P(\mathrm{degen}(G) = d) = \left(\frac{1}{2}\right)^{n-1} \binom{n-1}{d}.$$

Proposition 6.4. For a threshold graph G such that s = seq(G) contains at least k ones, a vertex $v \in V(G)$ is in the k-core of G if and only if $\deg(v) \ge k$.

Proof. Since G contains at least k one-vertices, G has a non-empty k-core by Proposition 6.1. If a vertex lies in the k-core of G, then by definition it has degree at least k in that induced subgraph, and therefore degree at least k in G.

Next, consider some vertex v such that $\deg(v) \ge k$. If v is a one-vertex, then v lies in the clique formed by the one-vertices and the base vertex, which has at least $k + 1$ vertices, and thus v is part of the k-core. On the other hand, if v is the base vertex or a zero-vertex, then all of its neighbors are one-vertices, so v is adjacent to at least k one-vertices in the k-core, and thus is part of the k-core as well. □

Because of this, if a non-empty k-core exists, then only one round of pruning occurs. Furthermore, the pruned vertices are all zero-vertices of degree less than k, since the base vertex and every one-vertex have degree at least k.
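Proposition 6.4 and the single-round claim can be checked on small cases. The sketch below (our own, with hypothetical names) computes the k-core by iterative deletion and verifies, for every creation sequence of length at most 9 and every k up to its number of ones, that the k-core is exactly the set of vertices of degree at least k and that only zero-vertices are pruned.

```python
from itertools import product


def build_threshold_graph(seq: str) -> list[set[int]]:
    """Adjacency sets; for i < j, vertices i and j are adjacent iff digit j is 1."""
    n = len(seq) + 1
    return [{j for j in range(n) if j != i and seq[max(i, j) - 1] == "1"} for i in range(n)]


def k_core(adj: list[set[int]], k: int) -> set[int]:
    """Vertex set of the k-core, by repeatedly deleting vertices of degree < k."""
    alive = set(range(len(adj)))
    changed = True
    while changed:
        changed = False
        for v in list(alive):
            if len(adj[v] & alive) < k:
                alive.discard(v)
                changed = True
    return alive


def check_proposition_6_4(max_len: int) -> bool:
    for length in range(max_len + 1):
        for bits in product("01", repeat=length):
            s = "".join(bits)
            adj = build_threshold_graph(s)
            for k in range(1, s.count("1") + 1):  # s has at least k ones
                core = k_core(adj, k)
                if core != {v for v in range(len(adj)) if len(adj[v]) >= k}:
                    return False
                pruned = set(range(len(adj))) - core
                # Vertex v >= 1 is a zero-vertex iff s[v - 1] == "0"; vertex 0 is the base.
                if any(v == 0 or s[v - 1] != "0" for v in pruned):
                    return False
    return True


print(check_proposition_6_4(9))  # True
```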

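Theorem 6.5. For a random threshold graph G on n vertices and $k \ge 1$,

$$P(|k\text{-core}(G)| = j) = \begin{cases} \displaystyle\sum_{i=0}^{k-1} \left(\frac{1}{2}\right)^{n-1} \binom{n-1}{i}, & j = 0, \\ \left(\frac{1}{2}\right)^{n+k-j} \binom{n+k-j-1}{k-1}, & k + 1 \le j \le n. \end{cases}$$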



Proof. For the k-core of G to be empty, the degeneracy of G can be at most $k - 1$. So by Corollary 6.3, the probability of having |k-core(G)| = 0 is

$$\sum_{i=0}^{k-1} P(\mathrm{degen}(G) = i) = \sum_{i=0}^{k-1} \left(\frac{1}{2}\right)^{n-1} \binom{n-1}{i}.$$
In the non-empty cases, the event of the k-core having exactly j vertices is the same as exactly $n - j$ vertices being removed by the pruning process. These discards, being zero-vertices of degree less than k, have at most $k - 1$ one-vertices of higher index, while every other zero-vertex has at least k. So seq(G) has exactly $n - j$ zeros lying to the right of the k-th one from the right.

To summarize, the k-core of G has exactly $j > 0$ vertices if and only if the right-most $n + k - j - 1$ digits of seq(G) contain exactly $k - 1$ ones and $n - j$ zeros, and the $(n + k - j)$-th digit from the right is a one. As there are no restrictions on the first $j - k - 1$ digits, there are $2^{j-k-1} \binom{n+k-j-1}{k-1}$ such sequences. Therefore,

$$P(|k\text{-core}(G)| = j) = \left(\frac{1}{2}\right)^{n-1} 2^{j-k-1} \binom{n+k-j-1}{k-1} = \left(\frac{1}{2}\right)^{n+k-j} \binom{n+k-j-1}{k-1}.$$

□
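Finally, the distribution of the k-core size can be checked against brute force. The sketch below (our own; names are hypothetical) computes k-cores by iterative deletion and compares, for $n \le 9$ and $1 \le k \le n$, with $\sum_{i<k} (1/2)^{n-1}\binom{n-1}{i}$ for $j = 0$ and $(1/2)^{n+k-j}\binom{n+k-j-1}{k-1}$ for $k + 1 \le j \le n$, all other sizes having probability zero.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import comb


def build_threshold_graph(seq: str) -> list[set[int]]:
    """Adjacency sets; for i < j, vertices i and j are adjacent iff digit j is 1."""
    n = len(seq) + 1
    return [{j for j in range(n) if j != i and seq[max(i, j) - 1] == "1"} for i in range(n)]


def k_core(adj: list[set[int]], k: int) -> set[int]:
    """Vertex set of the k-core, by repeatedly deleting vertices of degree < k."""
    alive = set(range(len(adj)))
    changed = True
    while changed:
        changed = False
        for v in list(alive):
            if len(adj[v] & alive) < k:
                alive.discard(v)
                changed = True
    return alive


def core_size_distribution_exact(n: int, k: int) -> list[Fraction]:
    """P(|k-core(G)| = j) for j = 0, ..., n, by enumerating creation sequences."""
    counts = Counter(
        len(k_core(build_threshold_graph("".join(bits)), k))
        for bits in product("01", repeat=n - 1)
    )
    return [Fraction(counts[j], 2 ** (n - 1)) for j in range(n + 1)]


def core_size_distribution_formula(n: int, k: int) -> list[Fraction]:
    """Closed form for k >= 1; sizes 1, ..., k have probability zero."""
    dist = [Fraction(0)] * (n + 1)
    dist[0] = sum((Fraction(comb(n - 1, i), 2 ** (n - 1)) for i in range(k)), Fraction(0))
    for j in range(k + 1, n + 1):
        dist[j] = Fraction(comb(n + k - j - 1, k - 1), 2 ** (n + k - j))
    return dist


print(core_size_distribution_formula(4, 2))
# [Fraction(1, 2), Fraction(0, 1), Fraction(0, 1), Fraction(1, 4), Fraction(1, 4)]
```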